Compression of DNA Sequences

نویسندگان

  • Stéphane Grumbach
  • Fariza Tahi
چکیده

We propose a lossless algorithm to compress the information contained in DNA sequences. None of the available universal algorithms compress such data. This is due to the speciicity of genetic information. Our method is based on regularities, such as the presence of palindromes, in the DNA. The results we obtain, although not satisfactory, are far beyond classical algorithms.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Genbit Compress Tool(GBC): A Java-Based Tool to Compress DNA Sequences and Compute Compression Ratio(bits/base) of Genomes

We present a Compression Tool , GenBit Compress”, for genetic sequences based on our new proposed “GenBit Compress Algorithm”. Our Tool achieves the best compression ratios for Entire Genome (DNA sequences) . Significantly better compression results show that GenBit compress algorithm is the best among the remaining Genome compression algorithms for non-repetitive DNA sequences in Genomes. The ...

متن کامل

A Biological sequence Compression based on Look up Table (LUT) using Complementary Palindrome of Fixed size

Data Storage costs have an appreciable proportion of total cost in the creation and analysis of DNA sequences. In particular, the increase in the DNA sequences is highly remarkable with compare to increase in the disk storage capacity. General text compression algorithms do not utilize the specific characteristics of DNA sequences. In this paper we have proposed a compression algorithm based on...

متن کامل

Dna Compression Using Hash Based Data Structure

DNA Sequences making up any organism comprise the basic blueprint of that organism so that understanding and analyzing different genes within sequences has become an extremely important task. Biologists are producing huge volumes of DNA sequences every day that makes genome sequence database growing exponentially. The databases such as EMBL, GenBank represent millions of DNA sequences filling m...

متن کامل

Biological sequence compression algorithms.

Today, more and more DNA sequences are becoming available. The information about DNA sequences are stored in molecular biology databases. The size and importance of these databases will be bigger and bigger in the future, therefore this information must be stored or communicated efficiently. Furthermore, sequence compression can be used to define similarities between biological sequences. The s...

متن کامل

Grammar-based Compression of DNA Sequences

Grammar-based compression algorithms infer context-free grammars to represent the input data. The grammar is then transformed into a symbol stream and finally encoded in binary. We explore the utility of grammar-based compression of DNA sequences. We strive to optimize the three stages of grammar-based compression to work optimally for DNA. DNA is notoriously hard to compress, and ultimately, o...

متن کامل

DNABIT Compress – Genome compression algorithm

Data compression is concerned with how information is organized in data. Efficient storage means removal of redundancy from the data being stored in the DNA molecule. Data compression algorithms remove redundancy and are used to understand biologically important molecules. We present a compression algorithm, "DNABIT Compress" for DNA sequences based on a novel algorithm of assigning binary bits...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1993